Abstract: In this paper we implement a radix 8 high-speed, low-power binary multiplier using the
Modified Gate Diffusion Input (MGDI) technique. We use the "Urdhva-Tiryakbhyam" (vertically and
crosswise) algorithm because, compared with other multiplication algorithms, it requires less
computation and is less complex, since it halves the total number of partial products. At gate level
the multiplier can be designed in any logic style, such as CMOS, PTL or TG, but the MGDI technique
gives far better results in terms of area, switching delay and power dissipation. The radix 8
high-speed, low-power pipelined multiplier is designed with the MGDI technique in DSCH 3.5, and the
layout is generated in the Microwind tool. Simulation is carried out in 0.12 µm technology at a 1.2 V
supply voltage, and the results are compared with the conventional CMOS technique. The simulation
results show a considerable improvement in area, switching delay and power dissipation.
I. Introduction

The majority of real-life applications, mainly in microprocessors and digital signal processors,
require the multiplication operation [1]. A speed-, area- and power-efficient implementation of a
multiplier is a very challenging problem. Multipliers are the main building blocks of many high-speed,
high-performance systems such as FIR filters, microprocessors and digital signal processors. The
performance of a digital system is generally determined by the performance of its multiplier. In such
applications, low power consumption is also a critical design issue.

Power dissipation in CMOS circuits [2] has three main sources: 1) the charging and discharging of
capacitive loads caused by changes in the input logic levels; 2) the short-circuit current that flows
through the direct path between the supply rails during output transitions; and 3) the leakage
current, which is determined by the fabrication technology and consists of the reverse-bias current in
the parasitic diodes formed between the source and drain diffusions and the bulk region of a
transistor, together with the subthreshold current that arises from the inversion charge present at
gate voltages below the threshold voltage. The short-circuit and leakage currents in CMOS circuits can
be made small with proper device and circuit design techniques. The dominant source of power
consumption is the charging and discharging of the node capacitances, which can be minimized by
reducing the switching activity of the transistors. The switching activity of a digital circuit also
depends on the logic style used to implement it. At the circuit/logic level [2], various CMOS logic
design techniques have been proposed to reduce power consumption, including complementary CMOS logic,
Pass Transistor Logic (PTL), pseudo-nMOS, Cascode Voltage Switch Logic, dynamic CMOS, clocked CMOS
logic, CMOS domino logic, modified domino logic and transmission gate (TG) logic. The Modified Gate
Diffusion Input (MGDI) technique solves most of the problems that occur in these CMOS and PTL
techniques. Compared with them, MGDI reduces the power dissipation, delay, transistor count and area
of digital circuits while keeping the circuit logic simple.
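
The dominant switching component described above is usually written in the standard form below. The
symbols are not defined in this paper and are introduced here only as assumptions for illustration:
\alpha is the switching activity factor, C_L the switched node capacitance, V_{DD} the supply voltage
and f the clock frequency.

```latex
P_{\text{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f
```

Under this expression, lowering the switching activity \alpha or the switched capacitance C_L (for
example through a lower transistor count) reduces dynamic power in direct proportion.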

In this paper, we design a low-power, fast radix 4 pipelined multiplier for 2-, 4- and 8-bit
multiplication using the MGDI technique, which requires fewer transistors, is faster and dissipates
less power than conventional CMOS techniques. The paper is organized as follows. Section II explains
the "Urdhva-Tiryakbhyam" (vertically and crosswise) multiplication algorithm, and Sections III to V
apply it to 2-bit, 4-bit and 8-bit multiplication. Section VI explains the MGDI technique and its
performance for basic digital gates. Sections VII and VIII present the implementation of the pipelined
multiplier using MGDI in the DSCH 3.5 and MICROWIND tools and the simulation results. The paper closes
with the conclusion in Section IX and the acknowledgement.
II. Vedic Multiplication Algorithm

The multiplier is based on an algorithm called Urdhva Tiryakbhyam (vertically and crosswise) from
ancient Indian Vedic Mathematics [4]. Urdhva Tiryakbhyam is a general multiplication principle
applicable to all cases of multiplication. It literally means 'vertically and crosswise'. It is based
on a unique concept in which the generation of the partial products is carried out concurrently with
their addition.
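
The vertical-and-crosswise principle can be sketched in software as a column-by-column sum of
crosswise partial products. This is an illustrative bit-level model only, not the hardware described
in this paper; the function names `urdhva_tiryakbhyam`, `to_bits` and `from_bits` are hypothetical.

```python
def to_bits(value, width):
    """Return `value` as a list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits: rebuild the integer from an LSB-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


def urdhva_tiryakbhyam(a_bits, b_bits):
    """Multiply two equal-length LSB-first bit lists column by column."""
    n = len(a_bits)
    assert len(b_bits) == n
    product, carry = [], 0
    for k in range(2 * n - 1):
        # Column k gathers every crosswise partial product A[i] AND B[k - i].
        column = carry
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            column += a_bits[i] & b_bits[k - i]
        product.append(column & 1)  # sum bit of this column
        carry = column >> 1         # carry passed to the next column
    product.append(carry)           # the last carry is the top product bit
    return product


print(urdhva_tiryakbhyam([1, 1], [1, 1]))  # 3 x 3 -> [1, 0, 0, 1], i.e. 9
```

In hardware all column sums are formed by parallel logic; the sequential loop here only mirrors the
order in which the columns are combined.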

Pipelining of the partial-product generation and summation is obtained with the Urdhva Tiryakbhyam
algorithm, which can be generalized to N x N bits. Since the partial products and their sums are
calculated in parallel, the time the multiplier needs to compute a product does not depend on the
clock frequency of the processor. The overall advantage is that it reduces the need for
microprocessors to operate at higher clock frequencies. While a higher clock frequency increases
processing power, it also increases power dissipation, which raises the operating temperature of the
device. By employing the pipelined multiplier, microprocessor designers can avoid these problems and
the device failures they may cause.

The processing efficiency of the multiplier can easily be increased by widening the input and output
data buses, since its structure is simple. For the same reason it can be laid out easily on a silicon
chip. As the number of bits increases, the delay and area of this multiplier increase only gradually
compared with other types of multipliers, which makes it time, area and power efficient. This
architecture is also observed to be the most efficient in terms of silicon area/speed.



III. Implementation Of 2x2 Bit Multiplier

The method for two-bit multiplication can be explained by considering two 2-bit numbers A and B,
where A = A1A0 and B = B1B0, as shown in Figure 1. First, the least significant bits are multiplied
vertically, which gives the Least Significant Bit (LSB) of the final product. Then the LSB of the
multiplicand is multiplied by the next higher bit of the multiplier, and the result is added to the
product of the LSB of the multiplier and the next higher bit of the multiplicand, in crosswise
manner. The sum gives the second bit of the final product, and the carry is added to the partial
product obtained by multiplying the most significant bits, giving a further sum and carry. This sum is
the third bit and the carry is the fourth bit of the final product:

S0 = A0B0 (1)

C1S1 = A1B0 + A0B1 (2)

C2S2 = C1 + A1B1 (3)
Figure 1: Block diagram of 2x2 bit Multiplier



The final result is C2S2S1S0. This multiplication method is applicable in all cases. The 2x2 bit
multiplier module is implemented using four two-input AND gates and two half-adders, as displayed in
the block diagram in Figure 1.
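
A gate-level sketch of the 2x2 block, built from four AND operations and two half adders as described
above, is given below. It is a behavioural model for checking the logic, not the MGDI circuit; the
names `half_adder` and `vedic_2x2` are hypothetical.

```python
def half_adder(x, y):
    """Return (sum, carry) of two single bits."""
    return x ^ y, x & y


def vedic_2x2(a1, a0, b1, b0):
    """2x2 vertical-and-crosswise multiplier; returns bits (C2, S2, S1, S0)."""
    s0 = a0 & b0                            # vertical product of the LSBs
    s1, c1 = half_adder(a1 & b0, a0 & b1)   # crosswise products
    s2, c2 = half_adder(c1, a1 & b1)        # vertical product of the MSBs plus carry
    return c2, s2, s1, s0


print(vedic_2x2(1, 1, 1, 1))  # 3 x 3 = 9 -> (1, 0, 0, 1)
```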

IV. Implementation Of 4x4 Bit Multiplier 

For a higher number of input bits, only a small modification is required: the bits of each input are
divided equally into two parts. In 4x4 bit multiplication, let the multiplicand be A = A3A2A1A0 and
the multiplier be B = B3B2B1B0. The output sequence of the multiplication result is S7S6S5S4S3S2S1S0.

Let us divide A and B into two parts each, 'A3A2' & 'A1A0' for A and 'B3B2' & 'B1B0' for B. Using the
basics of vertical and crosswise multiplication, taking two bits at a time and using 2x2 bit
multiplier blocks, we obtain the arrangement for 4x4 bit multiplication shown in Figure 2.
Figure 2: 4x4 Multiplication (A3A2A1A0 x B3B2B1B0)
Each block shown above is a 2x2 bit multiplier. The first 2x2 multiplier has inputs "A1A0" and
"B1B0". The last block is a 2x2 bit multiplier with inputs "A3A2" and "B3B2". The middle block
represents two 2x2 bit multipliers, with inputs "A3A2" & "B1B0" and "A1A0" & "B3B2". The final result
of the multiplication is the 8-bit word "S7S6S5S4S3S2S1S0". The block diagram of the 4x4 bit Vedic
multiplier is shown in Figure 3. To obtain the final product S7S6S5S4S3S2S1S0, four 2x2 bit
multipliers and three 4-bit Ripple Carry (RC) Adders are required. The first 4-bit RC Adder adds the
two 4-bit operands obtained from the cross multiplication in the two middle 2x2 bit multiplier
modules. The second 4-bit RC Adder adds two 4-bit operands: the concatenation of two grounded inputs
with the two most significant output bits of the rightmost 2x2 multiplier block, as shown in Figure 3,
and the output sum of the first RC Adder. Its carry, ca1, is forwarded to the third RC Adder. The
third 4-bit RC Adder adds two 4-bit operands: the concatenation of the carry ca1, a "0" and the two
most significant sum bits of the second RC Adder, and the output of the leftmost 2x2 multiplier
module. The arrangement of Ripple Carry Adders shown in Figure 3 helps to reduce the delay.
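
The adder arrangement can be checked with a small behavioural model. The sketch below follows the
description of four 2x2 blocks and three 4-bit ripple carry adders, with one stated assumption: the
text names a single forwarded carry, and here the carry-outs of the first and second adders are ORed
(at most one of them can be set) and placed at bit 2 of the third adder's operand. All function names
are hypothetical.

```python
def vedic_2x2(a, b):
    """2x2 block on 2-bit integers; returns the 4-bit product."""
    a1, a0, b1, b0 = a >> 1, a & 1, b >> 1, b & 1
    s0 = a0 & b0
    x, y = a1 & b0, a0 & b1
    s1, c1 = x ^ y, x & y                    # first half adder
    z = a1 & b1
    s2, c2 = c1 ^ z, c1 & z                  # second half adder
    return (c2 << 3) | (s2 << 2) | (s1 << 1) | s0


def rca4(x, y):
    """4-bit ripple carry adder; returns (4-bit sum, carry-out)."""
    total, carry = 0, 0
    for i in range(4):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        total |= (xi ^ yi ^ carry) << i
        carry = (xi & yi) | (carry & (xi ^ yi))
    return total, carry


def vedic_4x4(a, b):
    """4x4 multiplier from four 2x2 blocks and three 4-bit adders."""
    al, ah, bl, bh = a & 3, a >> 2, b & 3, b >> 2
    q0 = vedic_2x2(al, bl)                   # rightmost block
    q1 = vedic_2x2(ah, bl)                   # middle blocks
    q2 = vedic_2x2(al, bh)
    q3 = vedic_2x2(ah, bh)                   # leftmost block
    sum1, c1 = rca4(q1, q2)                  # adder 1: cross products
    sum2, c2 = rca4(sum1, q0 >> 2)           # adder 2: {0, 0, q0[3:2]}
    carry = c1 | c2                          # assumption: merged carry
    sum3, _ = rca4(q3, (carry << 2) | (sum2 >> 2))  # adder 3
    return (sum3 << 4) | ((sum2 & 3) << 2) | (q0 & 3)


print(vedic_4x4(15, 15))  # 225
```

The two lowest product bits come straight from the rightmost block, the next two from the second
adder, and the upper four from the third adder.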
V. Implementation Of 8x8 Bit Multiplier

Algorithm for 8x8 bit multiplication using Urdhva Tiryakbhyam (vertically and crosswise) for two
binary numbers:

A = A7A6A5A4 A3A2A1A0
       X1       X0
B = B7B6B5B4 B3B2B1B0
       Y1       Y0

      X1 X0
    * Y1 Y0
    -------
    F E D C

CP = X0 * Y0 = C
CP = X1 * Y0 + X0 * Y1 = D
CP = X1 * Y1 = E
F = carry overflow
where CP = cross product.
Figure 4: 8x8 Multiplication Block Diagram

The 8x8 bit multiplier is structured using 4x4 bit blocks, as shown in Figure 4. The 8-bit
multiplicand A is decomposed into a pair of 4-bit halves AH-AL. Similarly, the multiplier B is
decomposed into BH-BL. The 16-bit product can be written as:

P = A x B = (AH-AL) x (BH-BL)
  = 2^8 (AH x BH) + 2^4 (AH x BL + AL x BH) + AL x BL

where the factors 2^8 and 2^4 account for the bit positions of the high halves.



The outputs of the 4x4 bit multipliers are added accordingly, with the help of three ripple carry
adders, to obtain the final product. The basic building block of the 8x8 bit Vedic multiplier is thus
the 4x4 bit multiplier, implemented as a structural model. For the 8x8 bit multiplier, the 4x4 bit
multiplier units are used as components implemented in DSCH 3.5 and MICROWIND 3.1, and this
structural model gives the fastest design.
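
The decomposition into 4-bit halves can be illustrated as follows. Because AH and BH sit four bit
positions above AL and BL, the block products have to be shifted before they are summed. In this
sketch the 4x4 block is replaced by a stand-in function `mul4` (a hypothetical helper that simply
multiplies two 4-bit integers), so only the combination step is modelled.

```python
def mul4(x, y):
    """Stand-in for one 4x4 multiplier block (hypothetical helper)."""
    assert 0 <= x < 16 and 0 <= y < 16
    return x * y


def vedic_8x8(a, b):
    """Combine four 4x4 block products into the 16-bit product of a and b."""
    al, ah = a & 0xF, a >> 4
    bl, bh = b & 0xF, b >> 4
    c = mul4(al, bl)                    # C = X0 * Y0
    d = mul4(ah, bl) + mul4(al, bh)     # D = X1 * Y0 + X0 * Y1
    e = mul4(ah, bh)                    # E = X1 * Y1
    # Shifts by 4 and 8 bits place D and E at their bit positions.
    return (e << 8) + (d << 4) + c


print(vedic_8x8(19, 11))  # 209
```

In the hardware the three additions are carried out by ripple carry adders; plain integer addition is
used here for brevity.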
VI. MGDI Logic Design Technique

The basic GDI cell was first introduced by Arkadiy Morgenshtein in 2002 [5]. The basic GDI cell
(Figure 5) contains one nMOS and one pMOS transistor and has four terminals: G, P, N and D. Input G is
the common gate input of the nMOS and pMOS transistors, input P is the outer diffusion node of the
pMOS transistor, input N is the outer diffusion node of the nMOS transistor, and output D is the
common diffusion node of both transistors. GDI primitive cells are designed in twin-well CMOS or
silicon-on-insulator (SOI) technologies.




Figure 5: Basic GDI cell



With a few improvements to the GDI technique, paper [8] presented MGDI cells for all basic gates with
minimum power dissipation. Figure 6 shows the MGDI design of the basic gates: inverter, 2-input AND,
OR, NAND, NOR and 3-transistor XOR. The operation of the OR gate is described here. The source of the
pMOS is connected to input "B", the source of the nMOS is connected to input "A", and the gate
terminal G is connected to "A". When both inputs are low, the pMOS conducts in the linear region while
the nMOS is cut off, so the low level of B is passed to the output, producing 0. When A is high and B
is low, the pMOS is cut off and the nMOS conducts, passing A to the output and producing 1. When A is
low and B is high, the pMOS conducts and the nMOS is cut off, passing B to the output and again
producing 1. When A and B are both high, the nMOS conducts and passes A, again producing the output 1.
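
The OR gate connection can be checked with an idealised switch-level model of the cell: the pMOS
passes P to the output when G is low, and the nMOS passes N when G is high. The sketch ignores
threshold-voltage drops and all electrical effects, and the names `gdi_cell` and `mgdi_or` are
hypothetical.

```python
def gdi_cell(g, p, n):
    """Idealised GDI cell: output follows P when G = 0 and N when G = 1."""
    return p if g == 0 else n


def mgdi_or(a, b):
    """OR gate wiring from the text: G = A, pMOS source = B, nMOS source = A."""
    return gdi_cell(g=a, p=b, n=a)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, mgdi_or(a, b))
```

The printed truth table is 0 only for A = B = 0, which matches the OR function.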
Figure 6: Basic Digital Gates using MGDI technique



| IJMER | ISSN: 2249-6645 | www.ijmer.com | Vol. 4 | Iss. 11 | Nov. 2014 | 111 |

Design And Implementation Of 8 Bit Multiplier Using M.G.D.I. Technique



The comparative performance analysis [8] of MGDI and CMOS logic is presented in Table 1. The
comparison covers switching delay, transistor count and average power consumption. Table 1 shows that
MGDI performs better than CMOS logic. The CMOS technique uses about twice as many transistors as MGDI
to realize a digital gate. The XOR and XNOR gates need only three transistors in MGDI, whereas CMOS
logic uses eight.

VII. Implementation Of Pipelined Multiplier Using MGDI Cell 

A binary pipelined multiplier for 2x2, 4x4 and 8x8 bit multiplication is designed using the MGDI
cell. First, the basic gates, half adder, full adder and ripple carry adder are designed with the MGDI
cell in the MICROWIND and DSCH tools in 0.12 µm technology with a 1.2 V supply voltage. The W/L ratio
of both the nMOS and pMOS transistors is taken as 1.0/0.12 µm. To establish an unbiased testing
environment, the multiplier designs are simulated with comprehensive input patterns that cover every
possible transition of the multiplier and multiplicand bits. The cell delay is measured from the
instant the inputs reach 50% of the supply voltage to the instant the latest of the output signals
reaches the same voltage level. All transitions from one input bit combination to another are tested,
and the delay at each transition is measured. The average is taken as the cell delay. The power
dissipation of the multiplier is measured for the same input patterns, and the average power is
reported.
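
The 50%-to-50% delay definition can be sketched as a post-processing step on sampled waveforms. The
code below is an assumption about how such a measurement could be scripted, not the procedure built
into the tools; the function names and the sample waveform are hypothetical and made up for
illustration, not taken from the simulations in this paper.

```python
def crossing_time(times, volts, level):
    """Time at which the waveform first crosses `level` (linear interpolation)."""
    points = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the level")


def transition_delay(times, v_in, v_outs, vdd=1.2):
    """Delay from the input 50% crossing to the latest output 50% crossing."""
    half = vdd / 2
    t_in = crossing_time(times, v_in, half)
    t_out = max(crossing_time(times, v, half) for v in v_outs)
    return t_out - t_in


def cell_delay(delays):
    """Average of the per-transition delays."""
    return sum(delays) / len(delays)


# Made-up waveform samples (arbitrary time units, volts).
times = [0, 1, 2, 3, 4]
v_in = [0.0, 1.2, 1.2, 1.2, 1.2]
v_out_fast = [0.0, 0.0, 1.2, 1.2, 1.2]
v_out_slow = [0.0, 0.0, 0.0, 1.2, 1.2]
d = transition_delay(times, v_in, [v_out_fast, v_out_slow])
print(d, cell_delay([d, 1.0]))
```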
Figure 9: Layout of MGDI 4x4 Multiplier
Figure 10: Layout of MGDI 8x8 Multiplier
VIII. Simulation Results

The schematics of the multipliers are designed and simulated using the DSCH 3.5 VLSI CAD tool, while
the layout is generated using MICROWIND 3.1. For 8-bit multiplication, Figure 11 shows the simulation
result for the multiplicand (19)10 = (00010011)2 and the multiplier (11)10 = (00001011)2; the output
is (209)10 = (0000000011010001)2.
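
The binary encodings of the operands and of the product in this test case can be cross-checked with a
few lines of Python. This only verifies the arithmetic and does not reproduce the circuit simulation;
the variable names are hypothetical.

```python
multiplicand, multiplier = 19, 11
product = multiplicand * multiplier

# 8-bit operands and 16-bit product, most significant bit first.
a_bin = format(multiplicand, "08b")
b_bin = format(multiplier, "08b")
p_bin = format(product, "016b")

print(a_bin, b_bin, p_bin)  # 00010011 00001011 0000000011010001
```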
Table 3: Analysis of 8x8 Multiplier

Figure 11: Simulation result for (19)10 x (11)10



IX. Conclusion 

This paper has presented the architecture design, logic design and circuit implementation of an
8-bit pipelined multiplier. The area, delay and power of the multipliers were evaluated using the CMOS
and MGDI techniques, and the comparison with the CMOS technique is shown in Table 1, Table 2 and
Table 3. The pipelined multiplier with the MGDI technique gives lower delay and lower power
dissipation, and hence a higher speed of operation, than the CMOS technique.